5 research outputs found

    A survey of feature selection in Internet traffic characterization

    In the last decade, the research community has focused on new classification methods that rely on statistical characteristics of Internet traffic, instead of the previously popular port-number-based or payload-based methods, which are subject to ever greater constraints. Some research works based on statistical characteristics generated large feature sets of Internet traffic; however, it is now impossible to handle hundreds of features in big data scenarios, as doing so leads only to unacceptable processing time and misleading classification results caused by redundant and correlated data. As a consequence, a feature selection procedure is essential in the process of Internet traffic characterization. In this paper a survey of feature selection methods is presented: feature selection frameworks are introduced, and different categories of methods are briefly explained and compared; several proposals on feature selection in Internet traffic characterization are shown; finally, a future application of feature selection to a concrete project is proposed.

    Brief Announcement: Node Sampling Using Centrifugal Random Walks.

    We propose distributed algorithms for sampling networks based on a new class of random walks that we call Centrifugal Random Walks (CRW). A CRW is a random walk that starts at a source and always moves away from it. We propose CRW algorithms for connected networks with arbitrary probability distributions, and for grids and networks with regular concentric connectivity with distance-based distributions. All CRW sampling algorithms select a node with the exact probability distribution, do not need warm-up, and end in a number of hops bounded by the network diameter.
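
    The abstract does not give the hop rule, so the following is only a minimal sketch of one way a walk that always moves away from its source can hit a target distribution exactly. It assumes a tree rooted at the source with positive node weights and precomputed subtree weights; the names `subtree_weights`, `centrifugal_sample`, and `walk_distribution` are hypothetical and not taken from the paper.

```python
import random


def subtree_weights(children, weight, root):
    """Return total[v] = sum of the weights in the subtree rooted at v."""
    total = {}

    def visit(v):
        total[v] = weight[v] + sum(visit(c) for c in children.get(v, []))
        return total[v]

    visit(root)
    return total


def centrifugal_sample(children, total, weight, root, rng=random):
    """Walk away from the root until the walk decides to stop.

    Assumption (not from the abstract): at each node the walk stops with
    probability weight[node] / total[node]; otherwise it hops to a child
    chosen with probability proportional to that child's subtree weight.
    """
    node, hops = root, 0
    while True:
        kids = children.get(node, [])
        if not kids or rng.random() * total[node] < weight[node]:
            return node, hops
        node = rng.choices(kids, weights=[total[c] for c in kids])[0]
        hops += 1


def walk_distribution(children, weight, root):
    """Compute exactly the selection probability the rule above induces."""
    total = subtree_weights(children, weight, root)
    probs = {}
    stack = [(root, 1.0)]
    while stack:
        node, reach = stack.pop()
        kids = children.get(node, [])
        stay = 1.0 if not kids else weight[node] / total[node]
        probs[node] = reach * stay
        rest = sum(total[c] for c in kids)
        for c in kids:
            stack.append((c, reach * (1.0 - stay) * total[c] / rest))
    return probs
```

    Under these assumptions the stop and hop probabilities telescope, so each node is selected with probability equal to its weight divided by the total weight, and the walk never takes more hops than the depth of the tree.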

    A telecom analytics framework for dynamic quality of service management

    Since the beginning of the Internet, Internet Service Providers (ISPs) have needed to give users' traffic different treatments defined by agreements between the ISP and its customers. This procedure, known as Quality of Service (QoS) Management, has changed little in recent years (DiffServ and Deep Packet Inspection have been the most widely chosen mechanisms). However, the incremental growth of Internet users and services, together with the application of recent Machine Learning techniques, opens up the possibility of going one step forward in the smart management of network traffic. In this paper, we first survey current tools and techniques for QoS Management. Then we introduce clustering and classification Machine Learning techniques for traffic characterization, and the concept of Quality of Experience. Finally, with all these components, we present a new framework that will manage Quality of Service in a smart way in a telecom Big Data scenario, for both mobile and fixed communications.

    Regularized greedy column subset selection

    The Column Subset Selection Problem is a hard combinatorial optimization problem that provides a natural framework for unsupervised feature selection, and there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.
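
    To make the idea concrete, here is a naive greedy sketch for one plausible regularized objective. The abstract does not state the formulation, so the ridge-style objective below (reconstruction error plus `lam` times the squared Frobenius norm of the coefficients) is an assumption, and the function names are hypothetical. This brute-force version re-solves a ridge problem for every candidate column and does not reproduce the paper's efficient algorithm.

```python
import numpy as np


def ridge_objective(A, cols, lam):
    """Assumed objective: min over X of ||A - C X||_F^2 + lam * ||X||_F^2,
    where C holds the selected columns of A. Requires lam > 0."""
    C = A[:, cols]
    gram = C.T @ C + lam * np.eye(len(cols))
    X = np.linalg.solve(gram, C.T @ A)  # closed-form ridge solution
    return np.linalg.norm(A - C @ X) ** 2 + lam * np.linalg.norm(X) ** 2


def greedy_regularized_css(A, k, lam):
    """Pick k columns one at a time, each time adding the column that
    lowers the assumed regularized objective the most."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(A.shape[1]) if j not in selected]
        best = min(candidates,
                   key=lambda j: ridge_objective(A, selected + [j], lam))
        selected.append(best)
    return selected
```

    For a matrix with orthogonal columns, this rule picks columns in decreasing order of norm, since a larger column removes more of the objective. Adding a column can never increase the objective, because the new coefficient row can always be set to zero.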

    Construcción de redes de pequeño mundo mediante selección sesgada (Construction of small-world networks by biased selection).

    Small-world networks are now present in many distributed applications. They can be built by adding to a base graph long-range links drawn according to a given probability distribution. Current distributed systems use specific ad hoc solutions to compute the long-range links. In this article we propose a new distributed algorithm called Biased Selection (SS, from the Spanish "Selección Sesgada") that, using only a uniform sampling service (which may be implemented with a gossip protocol), can select long-range links according to any probability distribution. SS is an iterative algorithm with a single parameter (r) that sets the number of iterations to run. It has been proven that the sample obtained with SS converges to the target distribution as r increases. An analytical bound on the maximum relative error for a given value of r has also been derived. Although this article presents SS as a tool for sampling nodes in a network, it can be used in any context that requires sampling according to a given probability distribution, needing only a uniform sampling service to operate. Small-world networks following Kleinberg's model have been built using SS to choose the long-range links (neighbors) in torus structures. We have observed that, with a small number of iterations, (1) SS behaves very similarly to Kleinberg's harmonic distribution and (2) the average number of hops under greedy routing is no worse than in a network built with Kleinberg's distribution. We have also observed that, before convergence is reached, the average number of hops is lower than in networks built with Kleinberg's harmonic distribution (14% better in a 1000 x 1000 torus).
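
    The abstract does not spell out the SS update rule, so the sketch below is not the SS algorithm itself. It swaps in a standard independence Metropolis sampler, which has the same interface: it needs only a uniform sampling service and a number of iterations r, and it approaches the target distribution as r grows. The function names, the Manhattan torus distance, and the inverse-square weight used for the harmonic distribution are all assumptions made for illustration.

```python
import random


def torus_distance(u, v, n):
    """Lattice distance between two cells of an n x n torus."""
    dx = abs(u[0] - v[0])
    dy = abs(u[1] - v[1])
    return min(dx, n - dx) + min(dy, n - dy)


def biased_sample(uniform_sample, weight, r, rng=random):
    """Independence Metropolis sampler (a stand-in, not the paper's SS).

    Starts from a uniform sample and, for r iterations, draws a uniform
    candidate and accepts it with probability
    min(1, weight(candidate) / weight(current)).
    """
    current = uniform_sample()
    for _ in range(r):
        candidate = uniform_sample()
        if rng.random() * weight(current) < weight(candidate):
            current = candidate
    return current


def long_range_link(source, n, r, rng):
    """Choose a long-range neighbor of `source` on an n x n torus."""

    def uniform_sample():
        return (rng.randrange(n), rng.randrange(n))

    def weight(v):
        d = torus_distance(source, v, n)
        # Assumed harmonic weight d^-2; weight 0 forbids a self-link.
        return 0.0 if d == 0 else d ** -2

    return biased_sample(uniform_sample, weight, r, rng)
```

    A call such as `long_range_link((0, 0), 5, 30, random.Random(1))` returns one cell of a 5 x 5 torus. With this weight, nearby cells are chosen far more often than distant ones once r is large enough.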